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Who am I? 



• Andy Lasko 

• Consulted on dozens of the IC’s Largest OSINT 
Programs 

• 100’s of Private Sector OSINT programs 

• Technical Alliance Manager, Kapow Software 

- Premier OSINT Collection Platform since 1998 

- Booth 205 
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What is OSINT? 



Finding, selecting, and acquiring information 
from publicly available sources and analyzing it 
to produce actionable intelligence. 

• Media : newspapers, magazines, radio, television etc. 

• Web-based communities and user generated content: social-networking 
sites, video sharing sites, wikis, blogs etc. 

• Public Data : government reports, budgets, demographics, hearings, 
legislative debates, press conferences, speeches, marine and aeronautical 
safety warnings, environmental impact statements and contract awards. 

• Professional and Academic: conferences, professional associations, 
academic papers, and subject matter experts. 

• Geospatial Open Source: maps, atlases, gazetteers, port plans, navigation 
data, human terrain data, environmental data, commercial imagery etc. 
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Why Is OSINT The Internet Important? 

The growth of social media, social networking 
sites, media sharing sites, and their ease of access 
through various devices. 

- Whether its riots in Egypt, political protest in Iran or 
terror group recruitment, OSINT provides a relatively 
cheap and immediate form of intelligence for the 
community. 

• Al Jazeera reporter Dan Nolan tweeted during Egyptian 
clashes on 2 February: "Soldiers left 4 tanks outside 
museum. Now anti gov. protestors sitting on top. Main battle 
about 100m further toward gala st.” 

We must collect now! 
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How Good is Our OSINT Capability? 



• Lack Defined Processes 

- Unreliable Data, Sub-Par Processes 

• Lack of Automation 

- Wasted Time, No Re-Use 

• Overwhelmed by Unstructured Content 

- Over focus on Machine Learning and Al 

- Neglecting Structure in Unstructured Enrichment 

- Ignoring Structure to Influence the Enrichment Pipeline 

• Improper Priorities 

- OSINT is a low priority compared to other INTs. 

- Programs invest too heavily on manual efforts 

- Programs focus on making sense of messy collected data 
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OSINT Process Framework 




Language 


Entity 


Entity 


ID 


Extraction 


Resolution 



Geo- 

Tagging 



Translation Ontologies 



Visualization 
& Analysis 
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What Do We Need to Do? 



• Automate the collection process 

• Get more structure into your pipeline 

• Remove noise from the data 

• Improve accuracy of the data pipeline 

• Leverage multiple ontologies 

• Seamlessly discover information across 
structured and unstructured data 

• Crowdsource to improve enrichment 

• Push OSINT services to the people 
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Automate the Collection Processes 



• Deploy On-Line, On-Demand OSINT Services 

- Rapid Service Creation 

• Data is changing, too many sources, changing environment 

- On-Line 

• Leverage these services across the enterprise 

- On-Demand 

• Initiate new data collections 

• Query Enriched Content 

• Evaluate and Refine Processes 

• Invent New Processes 
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Demonstration 
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Finding Structure In the Unstructured 

• Broad Crawls 

- Use common data 

•HI, H2, Metadata tags -title, keywords 

• Targeted URL Crawls 

- Use the HTML tags to find structure on 
targeted crawls 

• Relationships, many to ones, dozens of data 
points 

- Requires an Extraction Browser 

• Always keep raw data 
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Remove Noise From The Data 



• Remove advertising through 
pattern matching 

• Don’t load Noise 

• Crowdsourcing, feedback loops, 
systems that learn based on user 
behavior 

Did you find these results useful? 

Yes No 
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Improve Accuracy of the Data Pipeline 



• Use the Structured Data Points to help the Pipeline’s 
Accuracy 

• Allow the Pipeline to make recursive calls 

- Re-collect or collect new content and call other portions of the 
pipeline as your workflow see’s fit. 

• Trust, trustworthy data, leverage less trustworthy data 

- An OSINT phone number lead to the death of Abu Musab al- 
Zarqawi, former al Qaeda in Iraq leader 

- A Google search on an IP address of interest returned a link to 
GhostNet’s central management console. 

• Teach Your Pipeline Applications 

- NLP technologies have used data collected to learn 
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Leverage Multiple Ontologies 



• Use Ontologies to Influence the Pipeline 

- Human Terrain Mapping Example of a news 
story 

• Allow different perspectives to process 

and evaluate data differently 

- Clearance means something different to 
truck driver than it does to someone in CIA 

- A ‘Tank’ means something different to an 
infantry man than to a logistician. 
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Seamlessly Discover Information Across 
Structured and Unstructured Data 



• One Box Example 



Google 



detroit weather 



Search 



About 42,600,000 results (0.06 seconds) 



Advanced search 



if Everything 

Images 
IK Videos 
3 News 
♦ Shopping 
▼ More 

Forest Glen, MD 

Change location 



Weather for Detroit, Ml 



50°F | °C 

Current: Mostly Cloudy 
Wind: N at 5 mph 
Humidity: 37% 



Thu 





Sun 




51°F | 38°F 47°F|43°F 63°F | 43°F 57 D F | 44°F 



Detailed forecast: The Weather Channel - Weather Underground - AccuWeather 



Detroit. Michigan (48201) Conditions & Forecast : Weather Underground Q. 
Apr 21, 2011 ... Hydraulic Jack’s Weather Detroit. Ml. 50.6 °F, 29 °F. 43%. WSW at 4.7 
mph, 0.00 in / hr. 620 ft. 2:37 PM EDT. Rapid Fire ... 

www.wunderground.com/US/MI/Detroit.html - Cached - Similar 



Source Selection 



Search Other Collections 



Search String: 
Max Results / Source: 
Sources: 



H 



El USDA Forest Service 
OAldo Leopold Research Institute 
El Minerals Management Service 

□ US Government Printing Office 
El US Fish and Wildlife Service 

□ Bureau of Land Management 

El NOAA's National Centers for Coastal Ocean Science 
El National Park Service 

□ National SeaGrant Library 

El US Army Corps of Engineers - Institute for Water Resources 

□ US Geological Survey 
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Crowdsource to Improve Enrichment 



• Enable people to rank the results 

- How accurate is the data 

- Were the right data elements collected 

- Is the Ontology Accurate 

- Is the translation correct 

- Manual Entity Tagging 

- Tag Finders - RSS Feed example of Machine 
Learning 

• Use that Feedback to Improve the Collection 
and Enrichment Pipeline 
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Push OSINT Services to the People 



On-Line, On-Demand OSINT Services Environment 

• Web Services 

• End User Environment Integrations 

- 12, Palantir, Thetus, ESRI, Visual Analytics, Inspire, 
MarkLogic etc. 

• Application Access 

- Data validation, data collection, integration 

• Federated Search 

- Internal, OSINT, Subscription, PKI etc. 

• Browser Plugins 
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Summary 




• We must not miss out on the internet as a 
source for intelligence 

• Analysts must have an interface for 
discovering valuable content and that 
content must be tagged and delivered in a 
manner that supports the knowledge 
discovery process of the analyst. 

• We must start today 
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Contacts 




Booth 205 

• Andy Lasko - Andrew.Lasko@KapowSoftware.conn 

• Brady Balls - Bradv.Balls@KapowSoftware.com 

• 703.489.1445 
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